chinese corpus

Translation and explanation of "chinese corpus"

Example sentences and usage

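  • Part-of-speech tagging is a fundamental topic in natural language processing. It is significant for Chinese corpus tagging, machine translation, and information retrieval over large-scale text.
    Chinese original (translated): Part-of-speech tagging is a basic research topic in natural language processing; the correctness of the tags is of great importance to Chinese corpus annotation, machine translation, and information retrieval over large-scale text.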
  • Based on statistics of maximal crossing ambiguities drawn from a large-scale Chinese corpus, I divide them into three types and handle each type with a different method. The modified algorithm greatly improves the handling of maximal crossing ambiguities.
    Chinese original (translated): On the basis of statistics on maximal crossing-ambiguity strings in large-scale real text, this work divides those strings into three classes and processes each class separately, which greatly improves the ability to handle them.
  • The performance of existing word segmenters on this type of ambiguity is still unsatisfactory. In this paper, crossing ambiguities in Chinese running text are described quantitatively and systematically, based on observations from a Chinese corpus of 100 million characters.
    Chinese original (translated): Based on a large Chinese corpus of 100 million characters and a Chinese word list of 110,000 words, this paper carries out an exhaustive survey of crossing-ambiguity strings, together with a multi-angle, multi-level statistical classification.
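  • Corpus building is fundamental work in Chinese information processing. Processing a Chinese corpus includes Chinese word segmentation and part-of-speech tagging. Both are widely used in research (for example, automatic retrieval of Chinese text, machine translation, and Chinese character recognition), and they provide important resources for that research.
    Chinese original (translated): Automatic word segmentation and part-of-speech tagging play a key role in many practical applications (automatic retrieval, filtering, classification, and summarization of Chinese text; automatic proofreading of Chinese text; Chinese-to-foreign-language machine translation; post-processing for Chinese character recognition and Chinese speech recognition; Chinese speech synthesis; sentence-level Chinese keyboard input; conversion between simplified and traditional characters; and so on), and they provide important resources and strong support for much corpus-based research.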
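  • The problem is critical: in the classical Chinese corpus developed by Academia Sinica over the past 14 years, more than 9,600 Chinese characters lack appropriate codes. In this paper, we present a database of Chinese graphemes through which the structure and attributes of any missing character can be represented.
    Chinese original (translated): At present, for regions that have inherited Chinese culture, the missing-character problem has become a shared nightmare: wherever Chinese characters occur in personal names, place names, historical materials, and the like, the problem is quite serious. It is therefore an international problem of common concern.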
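  • First, for character and word errors, the method uses the adjacency of characters and words and checks for errors by character-string co-occurrence probability. Second, for syntax errors, it draws on statistics and analysis of a large-scale contemporary Chinese corpus, identifies the predicate head word and the other sentence constituents, and checks for syntax errors. Third, for semantic errors, it builds a semantic dependency tree from HowNet knowledge and presents a method that computes sentence similarity through semantic dependency analysis to check for semantic errors.
    Chinese original (translated): To check character and word errors, this paper mainly uses bigram adjacency relations between characters and words and checks errors according to co-occurrence probability. To check grammatical errors, it uses a large-scale corpus already available in the authors' teaching and research group: statistical analysis of the corpus yields the linguistic regularities and knowledge needed for grammar checking, and errors in grammatical structure are checked by identifying the predicate head word and the other sentence constituents. To check semantic errors, it mainly uses HowNet knowledge to obtain a semantic dependency tree and checks semantic errors by computing the similarity of valid collocation pairs in the sentence.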
  • Abstract: Based on the statistical characteristics of Chinese maximal noun phrases (MNPs) in a Chinese corpus of 5,573 sentences, this paper proposes two efficient algorithms for identifying Chinese MNPs: (1) identification using boundary distribution probabilities, and (2) identification using internal structure rules. Experimental results show good performance with algorithm (2): precision of 85.4% and recall of 82.3%.
    Chinese original (translated): Abstract: Through statistical analysis of the distribution of maximal noun phrases in a corpus of 5,573 Chinese sentences, two effective algorithms for automatically identifying Chinese maximal noun phrases are proposed: one based on boundary distribution probabilities and one based on internal structure combination. Experiments show that the latter reaches a precision of 85.4% and a recall of 82.3%, a fairly good automatic identification result.